Guess Again and See if They Line up: Surrey's Runs at Plagiarism Detection Notebook for PAN at CLEF 2013
Abstract
This paper briefly describes the approaches taken to the two subtasks, Source Retrieval and Text Alignment, of the Plagiarism Detection track at PAN 13. For the first, we reuse our PAN 12 approach, which combines frequency and a contrastive corpus measure to select keywords for querying the ChatNoir search system; for the second, we reuse software that previously featured in PAN 11 and PAN 12. We comment on how effective both approaches were and on what steps should be taken if the competition remains substantially similar next time.
Similar resources
Diverse Queries and Feature Type Selection for Plagiarism Discovery Notebook for PAN at CLEF 2013
This paper describes the approaches used for the Plagiarism Detection task in the PAN 2013 international competition on uncovering plagiarism, authorship, and social software misuse. We present a modified three-way search methodology for the Source Retrieval subtask and analyse snippet similarity performance. The results show that the presented approach is adaptable to real-world plagiarism situations. For the ...
Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015
The text alignment corpus construction task at the PAN 2015 competition consists of preparing a plagiarism corpus that provides various obfuscation types and a range of obfuscation degrees. Its format and metadata structure should also follow those of previous PAN plagiarism corpora. In this paper, we describe our approach to constructing a monolingual Persian plagiarism corpus that can...
Evaluating Robustness for 'IPCRESS': Surrey's Text Alignment for Plagiarism Detection
This paper briefly describes the approach taken to the subtask of Text Alignment in the Plagiarism Detection track at PAN 14. We have now reimplemented our PAN12 approach in a consistent programmatic manner, courtesy of secured research funding. PAN 14 offers us the first opportunity to evaluate the performance/consistency of this re-implementation. We present results from this re-implementatio...
External & Intrinsic Plagiarism Detection: VSM & Discourse Markers based Approach - Notebook for PAN at CLEF 2011
This paper explains the performance of a plagiarism detection system that can detect both external and intrinsic plagiarism in text. It reports results on the PAN-PC-2011 test corpus. We investigated Vector Space Model based techniques for detecting external plagiarism cases and discourse-marker-based features for detecting intrinsic plagiarism cases.
Approaches for Source Retrieval and Text Alignment of Plagiarism Detection Notebook for PAN at CLEF 2013
In this paper, we describe our approach at the PAN@CLEF2013 plagiarism detection competition. For the Source Retrieval sub-task, we propose a method that combines TF-IDF, PatTree and Weighted TF-IDF to extract keywords from suspicious documents and uses them as queries to retrieve the plagiarism source documents. For the Text Alignment sub-task, we present a method based on sentence similarity. Our text alignment...
Publication date: 2013